SQL-AG: Querying structured documents using attribute grammars

Authors

  • Jan Van den Bussche
  • Stijn Vansummeren
  • Dieter Vrancken
Abstract

Structured documents, such as program source texts, technical documentation, or XML data, make up an important class of data in many applications. They are distinguished from flat text by their tree structure. In a program source text, this structure is the abstract syntax tree of the program. In a technical document, it is the division into chapters, sections, paragraphs, definitions, examples, footnotes, and so on. In an XML document, it is the tree structure given by the tag markup.

End users may be satisfied with simple keyword searches over structured documents. Application developers, however, need more powerful searches and transformations that take the structure explicitly into account. For example, in a Java source file we may want to look up all method names; in a technical document we may want to move all footnotes to an appendix. On XML data, structural queries and transformations are the bread and butter of special-purpose XML manipulation languages such as XSLT or XQuery.

XSLT and XQuery are indeed valuable existing tools for structural querying of structured documents. Nevertheless, there are interesting alternatives that merit exploration, and our SQL-AG system is one of them. In a nutshell, in SQL-AG the tree structure is defined by a context-free grammar, and a structured document is then viewed as a parse tree according to this grammar. The document is stored in a relational database. Structural queries are expressed by annotating the context-free grammar with attribute definitions, which is the standard attribute grammar mechanism. In SQL-AG, attribute values are relations over the nodes and text values of the document, and attribute definitions are written in standard SQL. SQL-AG is not meant as an ad hoc query language, but as a powerful tool for application developers.
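To make the mechanism concrete, here is a minimal Python sketch using the standard `sqlite3` module. It does not reproduce SQL-AG's actual syntax or storage scheme, which the abstract does not describe; the `node` table layout, the `attr_names` table, the `RULES` dictionary and the `method_names` function are all hypothetical. A small Java-like parse tree is stored one row per node, a synthesized attribute `names` is defined per grammar symbol by an SQL query whose result is a relation, and the attributes are evaluated bottom-up so that the root's attribute ends up holding all method names (the Java example from the abstract).

```python
import sqlite3

# Hypothetical encoding of a parse tree: one row per node.
# 'label' is the grammar symbol, 'pos' orders siblings, 'text' holds leaf text.
SCHEMA = """
CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER,
                   label TEXT, pos INTEGER, text TEXT);
CREATE TABLE attr_names (node INTEGER, name TEXT);
"""

# Parse tree for: class A { void foo() {} int bar() {} }
NODES = [
    (1, None, "ClassDecl", 0, None),
    (2, 1, "Identifier", 0, "A"),
    (3, 1, "MethodDecl", 1, None),
    (4, 3, "Type", 0, "void"),
    (5, 3, "Identifier", 1, "foo"),
    (6, 1, "MethodDecl", 2, None),
    (7, 6, "Type", 0, "int"),
    (8, 6, "Identifier", 1, "bar"),
]

# Hypothetical attribute definitions in SQL, keyed by grammar symbol.
# Each query receives the current node id as :n and yields a one-column relation.
RULES = {
    # A method declaration contributes the text of its Identifier child.
    "MethodDecl": "SELECT text FROM node WHERE parent = :n AND label = 'Identifier'",
}
# Default rule for every other symbol: union of the children's 'names' relations.
DEFAULT = ("SELECT DISTINCT a.name FROM attr_names a "
           "JOIN node c ON a.node = c.id WHERE c.parent = :n")


def postorder(conn, n):
    """Return the ids of n's subtree, children before their parent."""
    order = []
    children = conn.execute(
        "SELECT id FROM node WHERE parent = ? ORDER BY pos", (n,)).fetchall()
    for (c,) in children:
        order.extend(postorder(conn, c))
    order.append(n)
    return order


def method_names(nodes):
    """Evaluate the synthesized attribute 'names' and return it at the root."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", nodes)
    (root,) = conn.execute("SELECT id FROM node WHERE parent IS NULL").fetchone()
    for n in postorder(conn, root):
        (label,) = conn.execute(
            "SELECT label FROM node WHERE id = ?", (n,)).fetchone()
        query = RULES.get(label, DEFAULT)
        # Store the relation computed for node n as rows (n, name).
        conn.execute(f"INSERT INTO attr_names SELECT :n, * FROM ({query})", {"n": n})
    rows = conn.execute("SELECT name FROM attr_names WHERE node = ?", (root,))
    return sorted(name for (name,) in rows)


print(method_names(NODES))  # ['bar', 'foo']
```

Each attribute value here is a set of rows rather than a single value, mirroring the abstract's statement that attribute values are relations over nodes and text values. Post-order evaluation is simply the easiest strategy for purely synthesized attributes; inherited attributes would call for a different evaluation order.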

Similar resources

Information Retrieval from Structured Documents Represented by Attribute Grammars

This paper presents a system for Information Retrieval (IR) from collections of structured documents represented by Attribute Grammars (AG). Each document corresponds to a syntactic tree with nodes decorated with sets of attributes. The values of these attributes correspond to characteristics which specify the semantics of the textual content and the structure in order to perform IR. First, we ...

Using Attribute Grammars to Uniformly Represent Structured Documents - Application to Information Retrieval

This paper presents an ongoing work to uniformly represent structured documents by means of Attribute Grammars (AG). Each document corresponds to a syntactic tree with nodes decorated with sets of attributes. The values of these attributes correspond to characteristics which specify the semantics of both the textual content and the structural elements. We show how to use this representation for ...

ReQueSS: Relational Querying of Semi-Structured Data

We present a prototype of a Web querying interface which is capable of searching and querying unified Web sources of data that have sufficient hidden relational structure. The system converts query-related parts of Web pages into relational data and provides for SQL-like or QBE-like querying capability. The relational query is parsed for relevant information such as selection conditions and tab...

Standardizing the Querying Process with SGML: The SQL DTD

One of the most exciting applications of SGML which has emerged in the recent years is its use in document databases. The structural information embedded in SGML documents makes it possible to query SGML documents and extract information in an automatic manner; however, this querying process has not been standardized. As a result, different SGML database implementations use their own query lang...

Attribute grammars for unranked trees as a query language for structured documents

Document specification languages, like for instance XML, model documents using extended context-free grammars. These differ from standard context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather t...

Publication date: 2003